Abstract. In this paper, we define a new kind of weighted tree automata where the weights are only
supported by final states. We show that these automata are sequentializable and we study their closures 
under classical regular and algebraic operations. 

We then use these automata to compute the subtree kernel of two finite tree languages in an efficient 
way. Finally, we present some perspectives involving the root-weighted tree automata. 


1 Introduction 

Kernel methods have been widely used to extend the applicability of many well-known algorithms, such
as the Perceptron [?], Support Vector Machines [?], or Principal Component Analysis [?]. Tree kernels
are interesting approaches in areas of natural language processing based on machine learning. They have been
applied to several natural language tasks, e.g. relation extraction [?], syntactic parsing
re-ranking [3], named entity recognition [6,7] and Semantic Role Labeling [12].

The main idea of tree kernels is to compute the number of common substructures (subtrees and subset
trees) between two trees $t_1$ and $t_2$. In [14], Moschitti defined an algorithm for the computation of this type
of tree kernels which computes the kernels between two syntactic parse trees in $O(m \times n)$ time, where $m$ and
$n$ are the numbers of nodes in the two trees. To do this, Moschitti modified the function proposed by Collins
and Duffy in [3] by introducing a parameter $\sigma \in \{0,1\}$ which enables the SubTrees ($\sigma = 1$) or the SubSet
Trees ($\sigma = 0$) evaluation and which is defined for two trees $t_1$ and $t_2$ as follows. Given a set of substructures
$S = \{s_1, s_2, \ldots\}$, they defined the indicator function $I_i(n)$ which is equal to 1 if the substructure $s_i$ is rooted
at node $n$ and 0 otherwise. They defined the tree kernel function between the two trees $t_1$ and $t_2$ as follows:

$K(t_1, t_2) = \sum_{n_1 \in N_{t_1}} \sum_{n_2 \in N_{t_2}} \Delta(n_1, n_2)$,

where $N_{t_1}$ and $N_{t_2}$ are the sets of nodes of $t_1$ and $t_2$ respectively
and $\Delta(n_1, n_2) = \sum_{i=1}^{|S|} I_i(n_1) I_i(n_2)$. We can then compute $\Delta$ as follows:

- if the productions at $n_1$ and $n_2$ are different then $\Delta(n_1, n_2) = 0$,

- if the productions at $n_1$ and $n_2$ are the same and $n_1$ and $n_2$ are leaves then $\Delta(n_1, n_2) = 1$,

- if the productions at $n_1$ and $n_2$ are the same and $n_1$ and $n_2$ are not leaves then
$\Delta(n_1, n_2) = \prod_{j=1}^{nc(n_1)} \left(\sigma + \Delta(c_{n_1}^j, c_{n_2}^j)\right)$,
where $nc(n_1)$ is the number of children of $n_1$ and $c_n^j$ is the $j$-th child of the node $n$.

In [13], Moschitti proposed a new convolution kernel, namely the Partial Tree kernel, to fully exploit
dependency trees. He proposed an efficient algorithm for its computation which is based on applying the
selection of tree nodes with non-null kernel. In the following we propose a new technique to compute this
kind of tree kernels using weighted tree automata. We will start by defining a new class of weighted tree
automata that we call root-weighted tree automata and we will prove some properties of these weighted
tree automata. Then we will show that tree kernels can be computed efficiently using a general intersection
of the root-weighted tree automata defined here.

The paper is organized as follows. In the following section, we introduce trees, operations on trees
and on tree languages, and some preliminary notions used in the remaining sections. Section 3 presents the
sequentialization of root-weighted tree automata and the closure of these automata under rational or
algebraic operations. In Section 4 we present an efficient computation of the subtree kernel of two finite tree
languages. Finally, the different results described in this paper are summarized in the conclusion.


2 Preliminaries 


Let $\Sigma$ be a graded alphabet. A tree $t$ over $\Sigma$ is inductively defined by $t = f(t_1, \ldots, t_k)$ where $k$ is any integer,
$f$ is any symbol in $\Sigma_k$ and $t_1, \ldots, t_k$ are any $k$ trees over $\Sigma$. We denote by $T_\Sigma$ the set of trees over $\Sigma$. A tree
language over $\Sigma$ is a subset of $T_\Sigma$.

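Let $c$ be a symbol in $\Sigma_0$, $L$ be a tree language over $\Sigma$ and $t$ be a tree in $T_\Sigma$. The tree substitution of $c$
by $L$ in $t$, denoted by $t_{\{c \leftarrow L\}}$, is the language inductively defined by:

- $L$ if $t = c$;

- $\{d\}$ if $t = d \in \Sigma_0 \setminus \{c\}$;

- $f(t_{1\{c \leftarrow L\}}, \ldots, t_{k\{c \leftarrow L\}})$ if $t = f(t_1, \ldots, t_k)$ with $f \in \Sigma_k$ and $t_1, \ldots, t_k$ any $k$ trees over $\Sigma$.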

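The $c$-product $L_1 \cdot_c L_2$ of two tree languages $L_1$ and $L_2$ over $\Sigma$ is the tree language defined by
$L_1 \cdot_c L_2 = \bigcup_{t \in L_1} t_{\{c \leftarrow L_2\}}$. The iterated $c$-product of a tree language $L$ over $\Sigma$ is the tree language $L^{n_c}$ recursively
defined by:

- $L^{0_c} = \{c\}$,

- $L^{(n+1)_c} = L^{n_c} \cup L \cdot_c L^{n_c}$.

The $c$-closure of the tree language $L$ is the language $L^{*_c}$ defined by $L^{*_c} = \bigcup_{n \geq 0} L^{n_c}$.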
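
The substitution and the $c$-product can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a tree is encoded as a tuple `(symbol, child_1, ..., child_k)`, the function names are hypothetical, and the iterated product assumes the recurrence $L^{(n+1)_c} = L^{n_c} \cup L \cdot_c L^{n_c}$ with $L^{0_c} = \{c\}$.

```python
from itertools import product

def substitute(t, c, L):
    """Return t{c <- L} as a set of trees; a tree is a tuple (symbol, *children)."""
    symbol, children = t[0], t[1:]
    if not children:
        # The leaf c is replaced by every tree of L; any other leaf d yields {d}.
        return set(L) if symbol == c else {t}
    # Substitute in each child independently, then rebuild every combination.
    choices = [substitute(child, c, L) for child in children]
    return {(symbol, *combo) for combo in product(*choices)}

def c_product(L1, c, L2):
    """L1 ._c L2: the union of t{c <- L2} over all t in L1."""
    result = set()
    for t in L1:
        result |= substitute(t, c, L2)
    return result

def iterated_c_product(L, c, n):
    """L^{n_c}, assuming L^{0_c} = {c} and L^{(n+1)_c} = L^{n_c} U (L ._c L^{n_c})."""
    current = {(c,)}
    for _ in range(n):
        current = current | c_product(L, c, current)
    return current

a, b = ("a",), ("b",)
for tree in sorted(iterated_c_product({("f", a, b)}, "a", 2), key=str):
    print(tree)
```

Each occurrence of $c$ is substituted independently, which is why the combinations are enumerated with a cartesian product over the children.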

In the following, we make use of weighted tree automata in order to compute tree kernels. See [4] for
details about classical tree automata.

Let $t$ be a tree over an alphabet $\Sigma$. The tree $t^\sharp$ is obtained by indexing each symbol of $t$ with its position
in a prefix traversal. We denote by $\Sigma^\sharp$ the set of the indexed symbols that appear in $t^\sharp$. The function $h$ is the
dual function, which drops the indexes ($h(t^\sharp) = t$). Notice that the function $h$ defines an equivalence relation
over $T_{\Sigma^\sharp}$. Indeed, let $t_1$ and $t_2$ be two trees in $T_{\Sigma^\sharp}$. We define the relation $\sim_h$ by $t_1 \sim_h t_2 \Leftrightarrow h(t_1) = h(t_2)$.

Since $\sim_h$ is a relation based on the equality of images by $h$, it can be shown that

Lemma 1. The relation $\sim_h$ is an equivalence relation.

Let $\Sigma$ be an alphabet and $t = f(t_1, \ldots, t_k)$ be a tree in $T_\Sigma$.

The set $\mathrm{SubTree}(t)$ is the set inductively defined by $\mathrm{SubTree}(t) = \{t\} \cup \bigcup_{1 \leq j \leq k} \mathrm{SubTree}(t_j)$. Let $L$ be a
tree language over $\Sigma$. The set $\mathrm{SubTreeSet}(L)$ is the set defined by $\mathrm{SubTreeSet}(L) = \bigcup_{t \in L} \mathrm{SubTree}(t)$.

The formal tree series $\mathrm{SubTreeSeries}_t$ is the tree series over $\mathbb{N}$ inductively defined by $\mathrm{SubTreeSeries}_t =
t + \sum_{1 \leq j \leq k} \mathrm{SubTreeSeries}_{t_j}$. Let $L$ be a finite tree language. The formal tree series $\mathrm{SubTreeSeries}_L$ is the
tree series over $\mathbb{N}$ defined by $\mathrm{SubTreeSeries}_L = \sum_{t \in L} \mathrm{SubTreeSeries}_t$. Let us notice that if $L$ is not finite,
since $\Sigma$ is a finite set of symbols, there exists a tree $t$ in $\Sigma_0$ that appears infinitely many times as a subtree in $L$;
thus $\mathrm{SubTreeSeries}_L$ is a tree series over $\mathbb{N} \cup \{+\infty\}$.

Definition 1. Let $L_1$ and $L_2$ be two finite tree languages. The subtree series kernel of $L_1$ and $L_2$ is the
integer $\mathrm{KerSeries}(L_1, L_2)$ defined by:

$\mathrm{KerSeries}(L_1, L_2) = \sum_{t \in T_\Sigma} (\mathrm{SubTreeSeries}_{L_1} \times \mathrm{SubTreeSeries}_{L_2})(t)$.

Example 1. Let $\Sigma$ be the graded alphabet defined by $\Sigma_0 = \{a, b\}$, $\Sigma_1 = \{h\}$ and $\Sigma_2 = \{f\}$. Let us consider
the trees $t_1 = f(h(a), f(h(a), b))$, $t_2 = f(h(a), h(b))$ and $t_3 = f(f(b, h(b)), f(h(a), h(b)))$. Then it can be
shown that:

- $\mathrm{SubTree}(t_1) = \{t_1, f(h(a), b), h(a), a, b\}$

- $\mathrm{SubTree}(t_2) = \{t_2, h(a), h(b), a, b\}$

- $\mathrm{SubTreeSeries}_{t_1} = \mathbb{P}_{t_1} = t_1 + f(h(a), b) + 2h(a) + 2a + b$

- $\mathrm{SubTreeSeries}_{t_2} = \mathbb{P}_{t_2} = t_2 + h(b) + h(a) + a + b$

- $\mathrm{SubTreeSeries}_{t_3} = \mathbb{P}_{t_3} = t_3 + f(b, h(b)) + t_2 + 2h(b) + h(a) + 3b + a$

- $\mathrm{SubTreeSeries}_{\{t_1, t_2\}} = \mathbb{P}_{\{t_1, t_2\}} = \mathbb{P}_{t_1} + \mathbb{P}_{t_2} = t_1 + t_2 + f(h(a), b) + 3h(a) + h(b) + 3a + 2b$

- $\mathbb{P}_{\{t_1, t_2\}} \times \mathbb{P}_{\{t_3\}} = t_2 + 2h(b) + 3h(a) + 6b + 3a$

- $\mathrm{KerSeries}(\{t_1, t_2\}, \{t_3\}) = 15$
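
The figures of Example 1 can be checked with a direct, automaton-free computation. The sketch below is only an illustration: trees are encoded as tuples `(symbol, child_1, ..., child_k)`, a series is stored as a `collections.Counter`, and the function names are hypothetical.

```python
from collections import Counter

def subtree_series(t):
    """SubTreeSeries_t as a Counter mapping each subtree to its multiplicity."""
    series = Counter({t: 1})
    for child in t[1:]:
        series += subtree_series(child)
    return series

def language_series(L):
    """SubTreeSeries_L for a finite language L: the sum of the series of its trees."""
    total = Counter()
    for t in L:
        total += subtree_series(t)
    return total

def ker_series(L1, L2):
    """Sum over all trees of the pointwise product of the two subtree series."""
    s1, s2 = language_series(L1), language_series(L2)
    return sum(s1[t] * s2[t] for t in s1 if t in s2)

a, b = ("a",), ("b",)
t1 = ("f", ("h", a), ("f", ("h", a), b))
t2 = ("f", ("h", a), ("h", b))
t3 = ("f", ("f", b, ("h", b)), ("f", ("h", a), ("h", b)))
print(ker_series({t1, t2}, {t3}))  # prints 15
```

Only the trees occurring in both series contribute, here $t_2$, $h(b)$, $h(a)$, $b$ and $a$ with products 1, 2, 3, 6 and 3.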
3 Tree Series and Root-Weighted Tree Automata


A formal tree series $\mathbb{P}$ over a set $S$ is a mapping from $T_\Sigma$ to $S$. Let $\mathbb{M} = (M, +)$ be a monoid whose
identity is 0. The support of $\mathbb{P}$ is the set $\mathrm{Support}(\mathbb{P}) = \{t \in T_\Sigma \mid \mathbb{P}(t) \neq 0\}$. Any formal tree series can be viewed
as a formal sum $\mathbb{P} = \sum_{t \in T_\Sigma} \mathbb{P}(t)\, t$. In this case, the formal sum is considered associative and commutative.

Formal tree series can be realized by weighted tree automata. Weighted tree automata were defined over
semirings [?] or multioperator monoids [?]. In this paper, we use particular automata, the weights of which
belong to a monoid or a semiring, and only label the finality of states. Consequently, the automata we use
form a strict subclass of weighted tree automata, with particular properties.


3.1 Root-Weighted Tree Automata 


Definition 2. Let $\mathbb{M} = (M, +)$ be a commutative monoid. An $\mathbb{M}$-Root Weighted Tree Automaton ($\mathbb{M}$-RWTA)
is a 4-tuple $(\Sigma, Q, \nu, \delta)$ where:


- $\Sigma = \bigcup_{k \in \mathbb{N}} \Sigma_k$ is a graded alphabet,

- $Q$ is a finite set of states,

- $\nu$ is a function from $Q$ to $M$ called the root weight function,

- $\delta$ is a subset of $Q \times \Sigma_k \times Q^k$, called the transition set.


When there is no ambiguity, a M-RWTA is called a RWTA. 

The root weight function $\nu$ is extended to $2^Q \to M$ for any subset $S$ of $Q$ by $\nu(S) = \sum_{s \in S} \nu(s)$. The
function $\nu$ is equivalent to the finite subset of $Q \times M$ defined for any couple $(q, m)$ in $Q \times M$ by $(q, m) \in \nu \Leftrightarrow
\nu(q) = m$.

The transition set $\delta$ is equivalent to the function from $\Sigma_k \times Q^k$ to $2^Q$ defined for any symbol $f$ in $\Sigma_k$ and
for any $k$-tuple $(q_1, \ldots, q_k)$ in $Q^k$ by $q \in \delta(f, q_1, \ldots, q_k) \Leftrightarrow (q, f, q_1, \ldots, q_k) \in \delta$. The function $\delta$ is extended
to $\Sigma_k \times (2^Q)^k \to 2^Q$ as follows: for any symbol $f$ in $\Sigma_k$, for any $k$-tuple $(Q_1, \ldots, Q_k)$ of subsets of $Q$,
$\delta(f, Q_1, \ldots, Q_k) = \bigcup_{(q_1, \ldots, q_k) \in Q_1 \times \cdots \times Q_k} \delta(f, q_1, \ldots, q_k)$. Finally, the function $\Delta$ is the function from $T_\Sigma$ to
$2^Q$ defined for any tree $t = f(t_1, \ldots, t_k)$ in $T_\Sigma$ by $\Delta(t) = \delta(f, \Delta(t_1), \ldots, \Delta(t_k))$.

The weight of a tree $t$ associated with $A$ is $\nu(\Delta(t))$. The formal tree series realized by $A$ is the formal tree series
over $M$ denoted by $\mathbb{P}_A$ and defined by $\mathbb{P}_A(t) = \nu(\Delta(t))$, with $\nu(\emptyset) = 0$ where 0 is the identity of $\mathbb{M}$.


Example 2. Let us consider the graded alphabet $\Sigma$ defined by $\Sigma_0 = \{a\}$, $\Sigma_1 = \{h\}$ and $\Sigma_2 = \{f\}$. Let
$\mathbb{M} = (\mathbb{N}, +)$. The RWTA $A = (\Sigma, Q, \nu, \delta)$ defined by


- $Q = \{1, 2, 3, 4, 5\}$,

- $\nu = \{(1,0), (2,3), (3,1), (4,2), (5,4)\}$,

- $\delta = \{(1, a), (3, a), (2, f, 1, 3), (4, f, 3, 3), (5, h, 2), (5, h, 4), (5, h, 5)\}$,

is represented in Figure 1 and realizes the tree series:

$\mathbb{P}_A = a + 5f(a, a) + 4h(f(a, a)) + 4h(h(f(a, a))) + \cdots + 4h(h(\ldots h(f(a, a)) \ldots)) + \cdots$.
Fig. 1. The RWTA A.
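
The run of a RWTA on a tree and the resulting weight can be sketched as follows. This is a minimal illustration, assuming that trees are tuples `(symbol, child_1, ..., child_k)` and that the transition set is stored as a dictionary from `(symbol, tuple_of_child_states)` to the set of target states; the names `run` and `weight` are hypothetical. The data are those of Example 2.

```python
from itertools import product

def run(delta, t):
    """Delta(t): the set of states reached at the root of the tree t."""
    symbol, children = t[0], t[1:]
    child_sets = [run(delta, child) for child in children]
    reached = set()
    # Extension of delta to sets: union over all tuples of child states.
    for states in product(*child_sets):
        reached |= delta.get((symbol, states), set())
    return reached

def weight(nu, delta, t):
    """nu(Delta(t)); the empty sum is 0, the identity of (N, +)."""
    return sum(nu[q] for q in run(delta, t))

# The RWTA of Example 2.
nu = {1: 0, 2: 3, 3: 1, 4: 2, 5: 4}
delta = {
    ("a", ()): {1, 3},
    ("f", (1, 3)): {2},
    ("f", (3, 3)): {4},
    ("h", (2,)): {5},
    ("h", (4,)): {5},
    ("h", (5,)): {5},
}

a = ("a",)
faa = ("f", a, a)
print(weight(nu, delta, a), weight(nu, delta, faa), weight(nu, delta, ("h", faa)))  # prints 1 5 4
```

A tree outside the support, such as $h(a)$, reaches no state and therefore gets the weight 0.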


Let $A_1 = (\Sigma, Q_1, \nu_1, \delta_1)$ and $A_2 = (\Gamma, Q_2, \nu_2, \delta_2)$ be two RWTAs. A function $\mu$ is a morphism of RWTA
from $A_1$ to $A_2$ if:

- $\forall q \in Q_1$, $\mu(q) \in Q_2$,

- $\forall f \in \Sigma_k$, $\mu(f) \in \Gamma_k$,

- $\forall (q, f, q_1, \ldots, q_k) \in \delta_1$, $(\mu(q), \mu(f), \mu(q_1), \ldots, \mu(q_k)) \in \delta_2$,

- $\forall q \in Q_1$, $\nu_2(\mu(q)) = \nu_1(q)$.

A morphism $\mu$ from $A_1$ to $A_2$ is said to be an isomorphism if there exists a morphism from $A_2$ to $A_1$. In
this case, $A_1$ and $A_2$ are said to be isomorphic. It can be shown by induction over the structure of any tree
$t$ in $T_\Sigma$ that if $A_1$ and $A_2$ are isomorphic w.r.t. a morphism $\mu$ then $\Delta_2(\mu(t)) = \bigcup_{q \in \Delta_1(t)} \{\mu(q)\}$. Therefore

Lemma 2. Let $A_1$ be a RWTA over an alphabet $\Sigma$. Let $A_2$ be a RWTA isomorphic to $A_1$ w.r.t. a morphism
$\mu$. Then for any tree $t$ in $T_\Sigma$,

$\mathbb{P}_{A_2}(\mu(t)) = \mathbb{P}_{A_1}(t)$.

As a direct corollary, it holds 

Corollary 1. Two isomorphic RWTAs over the same alphabet realize the same tree series. 


3.2 RWTA Sequentialization 

The RWTA $A$ is said to be sequential if and only if for any tree $t$ in $T_\Sigma$, $\mathrm{Card}(\Delta(t)) \leq 1$. Unlike the case of
classical weighted tree and word automata, the RWTAs are sequentializable.

Theorem 1. For any RWTA $A$, there exists a sequential RWTA $A'$ such that $\mathbb{P}_A = \mathbb{P}_{A'}$.

In order to prove Theorem 1, let us define the subset construction [15] for any RWTA.

Definition 3. Let $A = (\Sigma, Q, \nu, \delta)$ be a RWTA. The sequential RWTA associated with $A$ is the RWTA
$A' = (\Sigma, 2^Q, \nu', \delta')$ defined by:

- $\forall S \subseteq Q$, $\nu'(S) = \sum_{s \in S} \nu(s)$,

- $\forall f \in \Sigma_k$, $\forall Q_1, \ldots, Q_k \subseteq Q$, $\delta'(f, Q_1, \ldots, Q_k) = \{\delta(f, Q_1, \ldots, Q_k)\}$.

Notice that $\nu'$ is equal to the extension of $\nu$ over the subsets of $Q$. However, $\delta'$ is not equal to the extension
of $\delta$ over sets since it necessarily returns a singleton.
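
The subset construction can be sketched as a fixpoint that only builds the subsets actually reachable from the leaves. This is an illustration under assumptions, not the construction as implemented by the authors: the dictionary encoding of the transition set, the `arity` map giving the rank of each symbol, and all function names are hypothetical. The data are those of Example 2.

```python
from itertools import product

def extended_delta(delta, symbol, state_sets):
    """delta extended to sets of states, returned as a hashable frozenset."""
    reached = set()
    for states in product(*state_sets):
        reached |= delta.get((symbol, states), set())
    return frozenset(reached)

def sequentialize(arity, nu, delta):
    """Reachable part of the subset construction: one target subset per key."""
    subsets, delta_seq = set(), {}
    changed = True
    while changed:
        changed = False
        snapshot = list(subsets)
        for symbol, k in arity.items():
            for args in product(snapshot, repeat=k):
                if (symbol, args) in delta_seq:
                    continue
                target = extended_delta(delta, symbol, args)
                delta_seq[(symbol, args)] = target
                subsets.add(target)
                changed = True
    # nu' is the extension of nu to subsets; the empty subset gets weight 0.
    nu_seq = {S: sum(nu[q] for q in S) for S in subsets}
    return subsets, nu_seq, delta_seq

def run_seq(delta_seq, t):
    """The unique subset reached by t in the sequential automaton."""
    args = tuple(run_seq(delta_seq, child) for child in t[1:])
    return delta_seq[(t[0], args)]

arity = {"a": 0, "h": 1, "f": 2}
nu = {1: 0, 2: 3, 3: 1, 4: 2, 5: 4}
delta = {("a", ()): {1, 3}, ("f", (1, 3)): {2}, ("f", (3, 3)): {4},
         ("h", (2,)): {5}, ("h", (4,)): {5}, ("h", (5,)): {5}}
subsets, nu_seq, delta_seq = sequentialize(arity, nu, delta)
print(len(subsets))  # prints 4
```

On this example the reachable subsets are $\{1,3\}$, $\{2,4\}$, $\{5\}$ and $\emptyset$, so the exponential bound $2^{\mathrm{Card}(Q)}$ is far from being reached.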
Lemma 3. Let $A = (\Sigma, Q, \nu, \delta)$. Let $A' = (\Sigma, 2^Q, \nu', \delta')$ be the sequential RWTA associated with $A$. For any
tree $t$ in $T_\Sigma$,

$\Delta'(t) = \{\Delta(t)\}$.

Proof. By definition of $A'$, $\Delta'(f(t_1, \ldots, t_k)) = \delta'(f, \Delta'(t_1), \ldots, \Delta'(t_k))$.

1. If $k = 0$, then $\Delta'(f) = \delta'(f)$. Moreover, by definition of $A'$, $\delta'(f) = \{\delta(f)\}$. Since by definition of $A$,
$\Delta(f) = \delta(f)$, it holds that $\Delta'(f) = \{\Delta(f)\}$.

2. Suppose that $k \neq 0$. According to the induction hypothesis, it holds that $\Delta'(f(t_1, \ldots, t_k)) = \delta'(f, \{\Delta(t_1)\}, \ldots, \{\Delta(t_k)\})$.
By definition of $\delta'$, $\delta'(f, \{\Delta(t_1)\}, \ldots, \{\Delta(t_k)\}) = \{\delta(f, \Delta(t_1), \ldots, \Delta(t_k))\}$, that equals by definition
$\{\Delta(f(t_1, \ldots, t_k))\}$.

Hence $\Delta'(t) = \{\Delta(t)\}$.


Proposition 1. Let $A$ be a RWTA and $A'$ be the sequential RWTA associated with $A$. Then:

$A'$ is a sequential RWTA that realizes $\mathbb{P}_A$.

Proof. Let $A = (\Sigma, Q, \nu, \delta)$ and $A' = (\Sigma, 2^Q, \nu', \delta')$. Let $t = f(t_1, \ldots, t_k)$ be a tree in $T_\Sigma$. According to
Lemma 3, $\Delta'(t) = \{\Delta(t)\}$. As a direct consequence, $\mathrm{Card}(\Delta'(t)) = 1$ (since the state $\emptyset$ may be reached) and
$\mathbb{P}_{A'}(t) = \nu'(\Delta'(t)) = \nu'(\Delta(t)) = \mathbb{P}_A(t)$. Hence $A'$ is a sequential RWTA that realizes $\mathbb{P}_A$. ■

Example 3. Let us consider the RWTA defined in Example 2. The sequential RWTA associated with $A$ is
represented in Figure 2.
Fig. 2. The sequential RWTA associated with A.


Since a sequential RWTA is a RWTA, the set of tree series realized by a RWTA is closed under
sequentialization, whatever the set of weights is. Let us now show that this set is also closed under several algebraic
operations.


3.3 Sum and Product Closures 

If $(M, +)$ is a commutative monoid, then the set of tree series over $M$ realized by a RWTA is closed under
the sum.

Definition 4. Let $A_1 = (\Sigma, Q_1, \nu_1, \delta_1)$ and $A_2 = (\Sigma, Q_2, \nu_2, \delta_2)$ be two RWTAs such that $Q_1 \cap Q_2 = \emptyset$. The
RWTA $A_1 + A_2$ is the RWTA $A' = (\Sigma, Q_1 \cup Q_2, \nu', \delta_1 \cup \delta_2)$ where $\nu'$ is the function defined for any state $q$
in $Q_1 \cup Q_2$ by:

$\nu'(q) = \nu_1(q)$ if $q \in Q_1$, and $\nu'(q) = \nu_2(q)$ otherwise.

Notice that if $Q_1$ and $Q_2$ are not disjoint, then $Q_2$ can be changed using an isomorphism.


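Proposition 2. Let $A_1$ and $A_2$ be two RWTAs. Then for any tree $t$ in $T_\Sigma$:

$\mathbb{P}_{A_1 + A_2}(t) = \mathbb{P}_{A_1}(t) + \mathbb{P}_{A_2}(t)$.

Proof. Let $A_1 = (\Sigma, Q_1, \nu_1, \delta_1)$, $A_2 = (\Sigma, Q_2, \nu_2, \delta_2)$ and $A' = A_1 + A_2 = (\Sigma, Q', \nu', \delta')$. Let $t = f(t_1, \ldots, t_k)$
be a tree in $T_\Sigma$. Let us first show by induction over the structure of $t$ that $\Delta'(t) = \Delta_1(t) \cup \Delta_2(t)$. By
definition, $\Delta'(f(t_1, \ldots, t_k)) = \delta'(f, \Delta'(t_1), \ldots, \Delta'(t_k))$.

1. If $k = 0$, then $\Delta'(f) = \delta'(f)$. By definition of $A'$, $\delta'(f) = \delta_1(f) \cup \delta_2(f)$, which equals by definition
$\Delta_1(f) \cup \Delta_2(f)$. Hence $\Delta'(t) = \Delta_1(t) \cup \Delta_2(t)$.

2. If $k \neq 0$, then by induction hypothesis, $\Delta'(f(t_1, \ldots, t_k)) = \delta'(f, \Delta_1(t_1) \cup \Delta_2(t_1), \ldots, \Delta_1(t_k) \cup \Delta_2(t_k))$.
Since $Q_1$ and $Q_2$ are disjoint, there is no transition $(q, f, q_1, \ldots, q_k)$ in $A'$ such that there exist two
integers $i$ and $j$ such that $q_i \in Q_1$ and $q_j \in Q_2$. Therefore $\delta'(f, \Delta_1(t_1) \cup \Delta_2(t_1), \ldots, \Delta_1(t_k) \cup \Delta_2(t_k)) =
\delta_1(f, \Delta_1(t_1), \ldots, \Delta_1(t_k)) \cup \delta_2(f, \Delta_2(t_1), \ldots, \Delta_2(t_k))$, that is equal to $\Delta_1(f(t_1, \ldots, t_k)) \cup \Delta_2(f(t_1, \ldots, t_k))$.
Hence $\Delta'(t) = \Delta_1(t) \cup \Delta_2(t)$.

As a direct consequence, $\mathbb{P}_{A'}(t) = \nu'(\Delta_1(t) \cup \Delta_2(t)) = \nu_1(\Delta_1(t)) + \nu_2(\Delta_2(t)) = \mathbb{P}_{A_1}(t) + \mathbb{P}_{A_2}(t)$. ■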
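
The sum construction can be sketched as a disjoint union in which each state is tagged with the index of the automaton it comes from, which plays the role of the renaming isomorphism. This is an illustration only: the dictionary encoding, the tagging scheme and the second automaton `B` (a single state of weight 2 reading the leaf $a$) are assumptions made for the example.

```python
from itertools import product

def run(delta, t):
    """Delta(t) for a transition dictionary (symbol, child_states) -> set of states."""
    child_sets = [run(delta, child) for child in t[1:]]
    reached = set()
    for states in product(*child_sets):
        reached |= delta.get((t[0], states), set())
    return reached

def weight(nu, delta, t):
    return sum(nu[q] for q in run(delta, t))

def rwta_sum(nu1, delta1, nu2, delta2):
    """A1 + A2: states are tagged (1, q) or (2, q) so that Q1 and Q2 are disjoint."""
    nu = {(1, q): w for q, w in nu1.items()}
    nu.update({(2, q): w for q, w in nu2.items()})
    delta = {}
    for tag, d in ((1, delta1), (2, delta2)):
        for (symbol, args), targets in d.items():
            key = (symbol, tuple((tag, q) for q in args))
            # Leaf transitions of both automata share the key (symbol, ()).
            delta.setdefault(key, set()).update((tag, q) for q in targets)
    return nu, delta

# A is the RWTA of Example 2; B is a hypothetical one-state RWTA.
nu_a = {1: 0, 2: 3, 3: 1, 4: 2, 5: 4}
delta_a = {("a", ()): {1, 3}, ("f", (1, 3)): {2}, ("f", (3, 3)): {4},
           ("h", (2,)): {5}, ("h", (4,)): {5}, ("h", (5,)): {5}}
nu_b = {0: 2}
delta_b = {("a", ()): {0}}

nu_s, delta_s = rwta_sum(nu_a, delta_a, nu_b, delta_b)
a = ("a",)
print(weight(nu_s, delta_s, a))  # prints 3
```

Since no transition mixes states carrying different tags, the run of the sum is the union of the two runs, as in the proof of Proposition 2.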


A semiring is a 5-tuple $\mathbb{K} = (K, +, \times, 0, 1)$ such that:

- $(K, +)$ is a commutative monoid the identity of which is 0,

- $(K, \times)$ is a monoid the identity of which is 1,

- $0 \times \alpha = \alpha \times 0 = 0$ for any $\alpha$ in $K$,

- $\times$ distributes over $+$.


In the following, we consider trees over an alphabet $\Sigma$ and tree series over the semiring $\mathbb{K}$. From this
structure, another stable operation can be defined for formal tree series over $\mathbb{K}$.

Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be two tree series. The product of $\mathbb{P}_1$ and $\mathbb{P}_2$ is the series $\mathbb{P}_1 \times \mathbb{P}_2$ defined for any tree $t$ by
$(\mathbb{P}_1 \times \mathbb{P}_2)(t) = \mathbb{P}_1(t) \times \mathbb{P}_2(t)$. Let us show now that the product can be performed via RWTAs.

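Definition 5. Let $A_1 = (\Sigma, Q_1, \nu_1, \delta_1)$ and $A_2 = (\Sigma, Q_2, \nu_2, \delta_2)$ be two RWTAs. The RWTA $A_1 \times A_2$ is the
RWTA $A' = (\Sigma, Q' = Q_1 \times Q_2, \nu', \delta')$ defined by:

- $\forall f \in \Sigma_k$, $\forall q_1 = (q_{1,1}, q_{2,1}), \ldots, q_k = (q_{1,k}, q_{2,k}) \in Q'$, $\delta'(f, q_1, \ldots, q_k) = \delta_1(f, q_{1,1}, \ldots, q_{1,k}) \times \delta_2(f, q_{2,1}, \ldots, q_{2,k})$,

- $\forall q = (q_1, q_2) \in Q'$, $\nu'(q) = \nu_1(q_1) \times \nu_2(q_2)$.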


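Lemma 4. Let $A_1 = (\Sigma, Q_1, \nu_1, \delta_1)$ and $A_2 = (\Sigma, Q_2, \nu_2, \delta_2)$ be two RWTAs and let $A' = A_1 \times A_2$. Then for any tree $t$ in $T_\Sigma$:

$\Delta'(t) = \Delta_1(t) \times \Delta_2(t)$.

Proof. By induction over the structure of $t = f(t_1, \ldots, t_k)$. By definition, $\Delta'(f(t_1, \ldots, t_k)) = \delta'(f, \Delta'(t_1), \ldots, \Delta'(t_k))$.

1. If $k = 0$, then $\Delta'(f) = \delta'(f)$. By definition of $A'$, $\delta'(f) = \delta_1(f) \times \delta_2(f)$, which equals by definition
$\Delta_1(f) \times \Delta_2(f)$. Hence $\Delta'(t) = \Delta_1(t) \times \Delta_2(t)$.

2. If $k \neq 0$, then by induction hypothesis, $\Delta'(f(t_1, \ldots, t_k)) = \delta'(f, \Delta_1(t_1) \times \Delta_2(t_1), \ldots, \Delta_1(t_k) \times \Delta_2(t_k))$.
According to the definition of $\delta'$,

$\delta'(f, \Delta_1(t_1) \times \Delta_2(t_1), \ldots, \Delta_1(t_k) \times \Delta_2(t_k)) = \bigcup_{q_j = (q_{1,j}, q_{2,j}) \in \Delta_1(t_j) \times \Delta_2(t_j),\ 1 \leq j \leq k} \delta'(f, q_1, \ldots, q_k)$.
By definition of $A'$, $\delta'(f, q_1, \ldots, q_k) = \delta_1(f, q_{1,1}, \ldots, q_{1,k}) \times \delta_2(f, q_{2,1}, \ldots, q_{2,k})$, for any $q_j = (q_{1,j}, q_{2,j}) \in
\Delta_1(t_j) \times \Delta_2(t_j)$, $1 \leq j \leq k$. Furthermore, by definition of the cartesian product of sets,

$\Delta'(t) = \bigcup_{q_j = (q_{1,j}, q_{2,j}) \in \Delta_1(t_j) \times \Delta_2(t_j),\ 1 \leq j \leq k} \delta_1(f, q_{1,1}, \ldots, q_{1,k}) \times \delta_2(f, q_{2,1}, \ldots, q_{2,k})$

$= \bigcup_{q_j \in \Delta_1(t_j),\ 1 \leq j \leq k} \delta_1(f, q_1, \ldots, q_k) \times \bigcup_{q_j \in \Delta_2(t_j),\ 1 \leq j \leq k} \delta_2(f, q_1, \ldots, q_k)$
that is equal to $\delta_1(f, \Delta_1(t_1), \ldots, \Delta_1(t_k)) \times \delta_2(f, \Delta_2(t_1), \ldots, \Delta_2(t_k)) = \Delta_1(t) \times \Delta_2(t)$ by definition.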
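Proposition 3. Let $A_1$ and $A_2$ be two RWTAs. Then for any tree $t$ in $T_\Sigma$:

$\mathbb{P}_{A_1 \times A_2}(t) = \mathbb{P}_{A_1}(t) \times \mathbb{P}_{A_2}(t)$.

Proof. Let $A_1 = (\Sigma, Q_1, \nu_1, \delta_1)$, $A_2 = (\Sigma, Q_2, \nu_2, \delta_2)$ and $A' = A_1 \times A_2 = (\Sigma, Q', \nu', \delta')$. Let $t = f(t_1, \ldots, t_k)$
be a tree in $T_\Sigma$. From Lemma 4, $\Delta'(t) = \Delta_1(t) \times \Delta_2(t)$. Hence $\mathbb{P}_{A'}(t) = \nu'(\Delta_1(t) \times \Delta_2(t))$. By definition
of $\nu'$, $\mathbb{P}_{A'}(t) = \sum_{q_1 \in \Delta_1(t),\ q_2 \in \Delta_2(t)} \nu_1(q_1) \times \nu_2(q_2)$. Since $\mathbb{K}$ is a semiring, by distributivity, $\mathbb{P}_{A'}(t) =
\left(\sum_{q_1 \in \Delta_1(t)} \nu_1(q_1)\right) \times \left(\sum_{q_2 \in \Delta_2(t)} \nu_2(q_2)\right)$. Therefore, according to the definition of $A_1$ and $A_2$, $\mathbb{P}_{A'}(t) =
\nu_1(\Delta_1(t)) \times \nu_2(\Delta_2(t))$. ■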
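
The product construction can be sketched in the same dictionary encoding. This is an illustration only: the encoding, the function names and the second automaton `B`, a hypothetical one-state RWTA realizing the constant series 2 over $\{a, h, f\}$, are assumptions made for the example.

```python
from itertools import product

def run(delta, t):
    """Delta(t) for a transition dictionary (symbol, child_states) -> set of states."""
    child_sets = [run(delta, child) for child in t[1:]]
    reached = set()
    for states in product(*child_sets):
        reached |= delta.get((t[0], states), set())
    return reached

def weight(nu, delta, t):
    return sum(nu[q] for q in run(delta, t))

def rwta_product(nu1, delta1, nu2, delta2):
    """A1 x A2 over Q1 x Q2, with nu'((p, q)) = nu1(p) * nu2(q)."""
    nu = {(p, q): w1 * w2 for p, w1 in nu1.items() for q, w2 in nu2.items()}
    delta = {}
    for (s1, args1), targets1 in delta1.items():
        for (s2, args2), targets2 in delta2.items():
            if s1 == s2 and len(args1) == len(args2):
                # Pair the child states componentwise, then pair the targets.
                key = (s1, tuple(zip(args1, args2)))
                delta.setdefault(key, set()).update(product(targets1, targets2))
    return nu, delta

# A is the RWTA of Example 2; B is a hypothetical RWTA of constant weight 2.
nu_a = {1: 0, 2: 3, 3: 1, 4: 2, 5: 4}
delta_a = {("a", ()): {1, 3}, ("f", (1, 3)): {2}, ("f", (3, 3)): {4},
           ("h", (2,)): {5}, ("h", (4,)): {5}, ("h", (5,)): {5}}
nu_b = {0: 2}
delta_b = {("a", ()): {0}, ("h", (0,)): {0}, ("f", (0, 0)): {0}}

nu_p, delta_p = rwta_product(nu_a, delta_a, nu_b, delta_b)
faa = ("f", ("a",), ("a",))
print(weight(nu_p, delta_p, faa))  # prints 10
```

The weight 10 is the product of the weights 5 and 2 of $f(a,a)$ in the two automata, in line with Proposition 3.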

Example 4. Let us consider the RWTA $A$ defined in Example 2 and let $A'$ be the RWTA represented in
Figure 3. The sum $A + A'$ is represented by the juxtaposition of Figure 1 and Figure 3, and the product
$A \times A'$ is represented in Figure 4.



Fig. 4. The RWTA A x A'. 

Notice that series realized by RWTAs are not necessarily closed under classical regular operations. 


3.4 Case of the a-Product 

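Let $a$ be a symbol in $\Sigma_0$. The $a$-product of $\mathbb{P}_1$ and $\mathbb{P}_2$ is the series $\mathbb{P}_1 \cdot_a \mathbb{P}_2$ defined for any tree $t$ by
$(\mathbb{P}_1 \cdot_a \mathbb{P}_2)(t) = \sum_{t_1, t_2 \in T_\Sigma,\ t = t_1 \cdot_a t_2} \mathbb{P}_1(t_1) \times \mathbb{P}_2(t_2)$.

Let us show that the $a$-product of two series realized by some RWTAs may not be realized by any RWTA.
The image of a tree series $\mathbb{P}$ is the set $\mathrm{Im}(\mathbb{P}) = \{\alpha \in K \mid \exists t \in T_\Sigma,\ \mathbb{P}(t) = \alpha\}$.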

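Lemma 5. Let $A$ be a RWTA. Then:

$\mathrm{Im}(\mathbb{P}_A)$ is a finite set.

Proof. Let $A = (\Sigma, Q, \nu, \delta)$. By definition, for any tree $t$ in $T_\Sigma$, $\mathbb{P}_A(t) = \sum_{q \in \Delta(t)} \nu(q)$. Consequently, $\mathbb{P}_A(t)$
belongs to the subset $\{\alpha \in K \mid \exists S \subseteq Q,\ \alpha = \nu(S)\}$ of $K$. Therefore, $\mathrm{Card}(\mathrm{Im}(\mathbb{P}_A))$ is at most $2^{\mathrm{Card}(Q)}$. ■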

Proposition 4. Let $\Sigma$ be an alphabet and $a$ be a symbol in $\Sigma_0$. There exist formal tree series $\mathbb{P}_1$ and $\mathbb{P}_2$
such that $\mathrm{Im}(\mathbb{P}_1 \cdot_a \mathbb{P}_2)$ is not finite.

Proof. Let $\mathbb{K} = (\mathbb{N}, +, \times, 0, 1)$. Let us consider the alphabet $\Sigma$ defined by $\Sigma_0 = \{a, b\}$, $\Sigma_2 = \{f\}$. Let us
consider the tree language $L$ defined by $(f(a, b))^{*_a}$. Let us consider the series $\mathbb{P}_1$ and $\mathbb{P}_2$ defined for any tree
$t$ in $T_\Sigma$ as follows:
- $\mathbb{P}_1(t) = 1$ if $t \in L$, and 0 otherwise;

- $\mathbb{P}_2(t) = 1$ if $t = a$, and 0 otherwise.

Let $A_1 = (\Sigma, Q_1, \nu_1, \delta_1)$ be the RWTA defined by:

- $Q_1 = \{0, 1\}$,

- $\nu_1(0) = 1$, $\nu_1(1) = 0$,

- $\delta_1(a) = \{0\}$, $\delta_1(b) = \{1\}$, $\delta_1(f, 0, 1) = \{0\}$.

Let $A_2 = (\Sigma, Q_2, \nu_2, \delta_2)$ be the RWTA defined by:

- $Q_2 = \{0\}$,

- $\nu_2(0) = 2$,

- $\delta_2(a) = \{0\}$.

It can be checked that:

1. the series $\mathbb{P}_1$ is realized by the RWTA $A_1$;

2. the series $\mathbb{P}_2$ is realized by the RWTA $A_2$;

3. the series $\mathbb{P}_1 \cdot_a \mathbb{P}_2$ associates any tree $t$ in $L$ with an integer that depends on $h(t)$, where $h(t)$ is the height of $t$.

Since $L$ is infinite, so is $\mathrm{Im}(\mathbb{P}_1 \cdot_a \mathbb{P}_2)$. According to Lemma 5, $\mathbb{P}_1 \cdot_a \mathbb{P}_2$ cannot be realized by any RWTA. ■

Corollary 2. Let $\Sigma$ be an alphabet and $a$ be a symbol in $\Sigma_0$. The tree series realized by some RWTAs are not
closed under $a$-product.

The same reasoning can be applied on the case of iterated product. 


3.5 Quotient of a RWTA 

Morphisms of RWTAs can be applied w.r.t. an equivalence relation in order to define quotients of RWTAs.

Given an equivalence relation $\sim$ over a set $Q$, we denote by $Q_\sim$ the set of equivalence classes of $\sim$. Given
a state $q$ in $Q$, we denote by $[q]_\sim$ the equivalence class of $q$ w.r.t. $\sim$, i.e. $\{q' \in Q \mid q' \sim q\}$.

Definition 6. Let $A = (\Sigma, Q, \nu, \delta)$ be a RWTA and $\sim$ be an equivalence relation over $Q$. The quotient of $A$
w.r.t. $\sim$ is the RWTA $A_\sim = (\Sigma, Q_\sim, \nu', \delta')$ defined by:

- $\forall C \in Q_\sim$, $\nu'(C) = \sum_{q \in C} \nu(q)$,

- $\forall C_1, \ldots, C_{k+1} \in Q_\sim$, $C_{k+1} \in \delta'(f, C_1, \ldots, C_k) \Leftrightarrow \forall i \leq k+1,\ \exists q_i \in C_i,\ q_{k+1} \in \delta(f, q_1, \ldots, q_k)$.

Notice that the quotient of a RWTA $A$ does not necessarily realize the same series as $A$. Nevertheless, in
the remainder of this paper, we use a particular relation that preserves the series under quotienting.

Definition 7. Let $A = (\Sigma, Q, \nu, \delta)$ be a RWTA and $q$ be a state in $Q$. The down language of $q$ is the language
$L_q(A)$ defined by:

$L_q(A) = \{t \in T_\Sigma \mid q \in \Delta(t)\}$.

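Proposition 5. The tree series realized by a RWTA $A = (\Sigma, Q, \nu, \delta)$ is equal to $\sum_{q \in Q} \nu(q) L_q(A)$.

Proof. By definition, it holds that $\mathbb{P}_A = \sum_{t \in T_\Sigma} \nu(\Delta(t))\, t$. Consequently, by definition of $\nu(\Delta(t))$, $\mathbb{P}_A =
\sum_{t \in T_\Sigma} \sum_{q \in \Delta(t)} \nu(q)\, t$. Furthermore, since any tree $t$ such that $\Delta(t)$ is not empty is a tree that belongs to $L_q(A)$
for some state $q$ in $Q$, $\mathbb{P}_A = \sum_{q \in Q} \sum_{t \mid q \in \Delta(t)} \nu(q)\, t$. Thus, by definition of $L_q(A)$, $\mathbb{P}_A = \sum_{q \in Q} \sum_{t \in L_q(A)} \nu(q)\, t$.

Consequently, since the coefficient $\nu(q)$ belongs to a semiring, by distributivity, $\mathbb{P}_A = \sum_{q \in Q} \nu(q) L_q(A)$. ■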

Definition 8. Let $A = (\Sigma, Q, \nu, \delta)$ be a RWTA. Let $\sim$ be an equivalence relation over $Q$. The relation $\sim$ is
said to be down compatible with $A$ if for any two states $q_1$ and $q_2$ in $Q$, it holds:

$q_1 \sim q_2 \Rightarrow L_{q_1}(A) = L_{q_2}(A)$.


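Proposition 6. Let $A$ be a RWTA and $\sim$ be an equivalence relation down compatible with $A$. Then:

$\mathbb{P}_A = \mathbb{P}_{A_\sim}$.

Proof. Let $A = (\Sigma, Q, \nu, \delta)$ and $A_\sim = (\Sigma, Q_\sim, \nu', \delta')$. According to Proposition 5, $\mathbb{P}_A = \sum_{q \in Q} \nu(q) L_q(A)$
and $\mathbb{P}_{A_\sim} = \sum_{C \in Q_\sim} \nu'(C) L_C(A_\sim)$. By definition of $\nu'$, $\mathbb{P}_{A_\sim} = \sum_{C \in Q_\sim} \left(\sum_{q \in C} \nu(q)\right) L_C(A_\sim)$. Since $\sim$ is down
compatible, for any state $C$ in $Q_\sim$, for any two states $q_1$ and $q_2$ in $C$, $L_{q_1}(A) = L_{q_2}(A)$. Therefore $\mathbb{P}_{A_\sim} =
\sum_{C \in Q_\sim} \left(\sum_{q \in C} \nu(q) L_q(A)\right)$. Moreover, since $\sim$ is an equivalence relation, any state of $Q$ belongs to one and
only one state $C$ in $Q_\sim$. Consequently, $\mathbb{P}_{A_\sim} = \sum_{q \in Q} \nu(q) L_q(A) = \mathbb{P}_A$. ■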

Example 5. Let us consider the RWTA $A'' = A \times A'$ represented in Figure 4. Let us consider the equivalence
relation $\sim$ over the set of states of $A''$ defined by $(q_1, q_1') \sim (q_2, q_2') \Leftrightarrow q_1 = q_2$. It can be shown that $\sim$ is
down compatible with $A''$. The quotient $A''_\sim$ is represented in Figure 5.
Fig. 5. The RWTA $A''_\sim$.


Now that we have defined the notion of RWTA, let us apply it to tree kernel computations.


4 Subtree Kernel 

In this section, we show how to efficiently compute the subtree kernel of two finite tree languages using
RWTAs. We first associate any tree with a RWTA that realizes its subtree series.


4.1 Subtree Automaton of a Tree 

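Definition 9. Let $\Sigma$ be an alphabet. Let $t$ be a tree in $T_\Sigma$. The subtree automaton associated with $t$ is the
RWTA $A_t = (\Sigma, Q, \nu, \delta)$ defined by:

- $Q = \mathrm{SubTreeSet}(t^\sharp)$,

- $\forall q \in Q$, $\nu(q) = 1$,

- $\forall f \in \Sigma^\sharp$, $\forall t_1, \ldots, t_{k+1} \in Q$, $t_{k+1} \in \delta(h(f), t_1, \ldots, t_k) \Leftrightarrow t_{k+1} = f(t_1, \ldots, t_k)$.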

Example 6. Let us consider the tree $t_1 = f(h(a), f(h(a), b))$ defined in Example 1. Then $t_1^\sharp = f_1(h_2(a_3), f_4(h_5(a_6), b_7))$.
The RWTA $A_{t_1}$ is represented in Figure 6 where all the root weights, equal to 1, are not represented.
Fig. 6. The RWTA $A_{t_1}$.


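Lemma 6. Let $\Sigma$ be an alphabet. Let $t$ be a tree in $T_\Sigma$. Then:

$\mathbb{P}_{A_t} = \mathrm{SubTreeSeries}_t$.


Proof. Let us set $A_t = (\Sigma, Q, \nu, \delta)$ and $A_{t_i} = (\Sigma, Q_i, \nu_i, \delta_i)$ for $1 \leq i \leq k$. Notice that by definition:

$Q = \{t\} \cup \bigcup_{1 \leq i \leq k} Q_i$ and $\delta = \{(t, f, t_1, \ldots, t_k)\} \cup \bigcup_{1 \leq i \leq k} \delta_i$.

Consequently, $\mathbb{P}_{A_t} = t + \sum_{1 \leq j \leq k} \mathbb{P}_{A_{t_j}}$. By definition, $\mathrm{SubTreeSeries}_t = t + \sum_{1 \leq j \leq k} \mathrm{SubTreeSeries}_{t_j}$.
Furthermore, by induction hypothesis, $\mathbb{P}_{A_{t_j}} = \mathrm{SubTreeSeries}_{t_j}$. Therefore it holds that

$\mathbb{P}_{A_t} = t + \sum_{1 \leq j \leq k} \mathrm{SubTreeSeries}_{t_j} = \mathrm{SubTreeSeries}_t$. ■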


Another RWTA can be defined in order to realize the subtree series associated with a tree. This RWTA
needs less space since its states are exactly the subtrees of $t$.


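Definition 10. Let $\Sigma$ be an alphabet. Let $t$ be a tree in $T_\Sigma$. The sequential subtree automaton associated
with $t$ is the RWTA $\mathrm{seq}(A_t) = (\Sigma, Q, \nu, \delta)$ defined by:


- $Q = \mathrm{SubTreeSet}(t)$,

- $\forall t' \in Q$, $\nu(t') = \mathrm{SubTreeSeries}_t(t')$,

- $\forall f \in \Sigma$, $\forall t_1, \ldots, t_{k+1} \in Q$, $t_{k+1} \in \delta(f, t_1, \ldots, t_k) \Leftrightarrow t_{k+1} = f(t_1, \ldots, t_k)$.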


Example 7. Let us consider the tree $t_1 = f(h(a), f(h(a), b))$ defined in Example 1. The RWTA $\mathrm{seq}(A_{t_1})$ is
represented in Figure 7.
Fig. 7. The RWTA $\mathrm{seq}(A_{t_1})$.


However, the sequential subtree automaton needs the tree series to be known in order to compute it.
Nevertheless, we show how to compute it from a quotient of the subtree automaton. Once the subtree automaton is computed,
it can be reduced using the equivalence $\sim_h$. Furthermore, this RWTA is isomorphic to the one obtained by
subset construction. Consequently, we compute a sequential RWTA the number of states of which is equal to
the number of distinct subtrees of $t$. We first show that $\sim_h$ is down compatible with the subtree automaton,
then we show that its application leads to the computation of the sequential subtree automaton.
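
The two steps, building the subtree automaton over the indexed tree and then quotienting it by $\sim_h$, can be sketched as follows. This is an illustration under assumptions: an indexed symbol is encoded as a pair `(symbol, position)`, each equivalence class is represented by the common image $h(q)$ of its members, and all function names are hypothetical. The sketch favours clarity and makes no attempt to reach the linear bound.

```python
from collections import Counter

def index_tree(t, counter=None):
    """t^sharp: tag every symbol with its position in a prefix traversal."""
    if counter is None:
        counter = [0]
    counter[0] += 1
    label = (t[0], counter[0])
    children = tuple(index_tree(child, counter) for child in t[1:])
    return (label, *children)

def h(r):
    """Drop the indexes of an indexed tree."""
    return (r[0][0], *(h(child) for child in r[1:]))

def subtrees(r):
    yield r
    for child in r[1:]:
        yield from subtrees(child)

def subtree_automaton(t):
    """A_t: one state of root weight 1 per node of t (indexed subtrees)."""
    states = list(subtrees(index_tree(t)))
    nu = {q: 1 for q in states}
    delta = {}
    for q in states:
        # The transition reads the unindexed symbol h(f) and the child states.
        delta.setdefault((q[0][0], q[1:]), set()).add(q)
    return nu, delta

def quotient_by_h(nu, delta):
    """Quotient by ~h: the class of q is named h(q), and class weights are summed."""
    nu_q = Counter()
    for q, w in nu.items():
        nu_q[h(q)] += w
    delta_q = {}
    for (symbol, args), targets in delta.items():
        key = (symbol, tuple(h(arg) for arg in args))
        delta_q.setdefault(key, set()).update(h(q) for q in targets)
    return dict(nu_q), delta_q

a, b = ("a",), ("b",)
t1 = ("f", ("h", a), ("f", ("h", a), b))
nu, delta = subtree_automaton(t1)
nu_q, delta_q = quotient_by_h(nu, delta)
print(len(nu), len(nu_q))  # prints 7 5
```

For $t_1$ the seven states of $A_{t_1}$ collapse into five classes, and the root weights of the quotient are the coefficients of $\mathrm{SubTreeSeries}_{t_1}$.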

Lemma 7. Let $\Sigma$ be a graded alphabet. Let $t$ be a tree in $T_\Sigma$. Let $A_t = (\Sigma, Q, \nu, \delta)$. For any tree $r$ in
$\mathrm{SubTreeSet}(t)$, it holds:

$\Delta(r) = \{r' \in Q \mid h(r') = r\}$.

Proof. By induction over the structure of $r = f(r_1, \ldots, r_k)$. By definition of $\Delta$, $\Delta(r) = \delta(f, \Delta(r_1), \ldots, \Delta(r_k))$.
By induction hypothesis, $\Delta(r_i) = \{r_i' \in Q \mid h(r_i') = r_i\}$. Thus, $\Delta(r) = \{f_j(r_1', \ldots, r_k') \in Q \mid h(f_j) = f \wedge \forall i,\ r_i =
h(r_i')\}$. Hence $\Delta(r) = \{r' \in Q \mid h(r') = r\}$. ■

As a direct consequence, for any state $r$ in $Q$, any tree $r'$ in $L_r(A_t)$ satisfies $h(r) = r'$.

Corollary 3. Let $\Sigma$ be a graded alphabet. Let $t$ be a tree in $T_\Sigma$. Let $A_t = (\Sigma, Q, \nu, \delta)$. Then for any state $r$
in $Q$, $L_r(A_t) = \{h(r)\}$.

Lemma 8. Let $\Sigma$ be a graded alphabet. Let $t$ be a tree in $T_\Sigma$. Then:

$\sim_h$ is down compatible with $A_t$.

Proof. Let $A_t = (\Sigma, Q, \nu, \delta)$. According to Corollary 3, $L_r(A_t) = \{h(r)\}$. Consequently, for any two states
$r_1$ and $r_2$ in $Q$, $r_1 \sim_h r_2 \Leftrightarrow h(r_1) = h(r_2) \Leftrightarrow L_{r_1}(A_t) = L_{r_2}(A_t)$. ■

Proposition 7. Let $\Sigma$ be a graded alphabet. Let $t$ be a tree in $T_\Sigma$. Then:

The RWTA $\mathrm{seq}(A_t)$ is isomorphic to $A_{t\sim_h}$.

Proof. Let us set $A_t = (\Sigma, Q = \mathrm{SubTreeSet}(t^\sharp), \nu, \delta)$, $A_{t\sim_h} = (\Sigma, Q_{\sim_h}, \nu', \delta')$ and $\mathrm{seq}(A_t) = (\Sigma, \mathrm{SubTreeSet}(t), \nu'', \delta'')$.

By definition, any state $C = \{t_1, \ldots\}$ in $Q_{\sim_h}$ can be associated with $h(t_1)$ since any two states $q$ and
$q'$ in $C$ satisfy by definition $h(q) = h(q')$. Consequently, let us consider the function $g$ that associates to
any state $C = \{t_1, \ldots\}$ in $Q_{\sim_h}$ the tree $h(t_1)$. Notice that this function is bijective since for any tree $r$,
$g^{-1}(r) = \{r' \in \mathrm{SubTreeSet}(t^\sharp) \mid h(r') = r\}$. Let us show that this function defines an isomorphism between
$A_{t\sim_h}$ and $\mathrm{seq}(A_t)$.

1. By definition, for any state $C$ in $Q_{\sim_h}$, $g(C)$ belongs to $\mathrm{SubTreeSet}(t)$.

2. For any transition $(C_{k+1}, f, C_1, \ldots, C_k)$ in $\delta'$, there exist by definition $t_1 \in C_1, \ldots, t_{k+1} \in C_{k+1}$ in $Q$ and a symbol
$f_j$ satisfying $h(f_j) = f$ such that $(t_{k+1} = f_j(t_1, \ldots, t_k), f, t_1, \ldots, t_k)$ is in $\delta$. Consequently $f_j(t_1, \ldots, t_k)$ is a
subtree of $t^\sharp$ and then $f(h(t_1), \ldots, h(t_k))$ is a subtree of $t$. Therefore by definition of $\mathrm{seq}(A_t)$, $(g(C_{k+1}) =
h(t_{k+1}), f, g(C_1) = h(t_1), \ldots, g(C_k) = h(t_k))$ is in $\delta''$.

3. According to Corollary 3, for any state $r$ in $Q$, $L_r(A_t) = \{h(r)\}$. Consequently, for any state $C$ in
$Q_{\sim_h}$, every state $r$ in $C$ satisfies $L_r(A_t) = \{g(C)\}$. According to Lemma 7, for the tree $r' = g(C)$, $\Delta(r') = \{r \mid h(r) = r'\} = C$. Moreover, according
to Lemma 6, for any tree $r'$, $\nu(\Delta(r')) = \mathrm{SubTreeSeries}_t(r')$. Then $\nu'(C) = \mathrm{SubTreeSeries}_t(g(C)) =
\nu''(g(C))$. ■


We denote by $|t|$ the size of a tree $t$, i.e. the number of its nodes. Since the subtree automaton and the
relation $\sim_h$ can be computed in linear time, it holds that

Corollary 4. Let $\Sigma$ be an alphabet. Let $t$ be a tree in $T_\Sigma$. Then:

The RWTA $\mathrm{seq}(A_t)$ can be computed in time and space $O(|t|)$.

Let us now show that the sequential subtree automaton is a sequential RWTA. Let us prove it by showing
that the computation of the accessible part of the subset construction leads exactly to the computation of
the quotient of the subtree automaton, where the accessible part of the sequential RWTA associated with a
RWTA is the RWTA based on the states the down languages of which are not empty.

Proposition 8. Let $\Sigma$ be an alphabet. Let $t$ be a tree in $T_\Sigma$. Then:

The accessible part of the sequential RWTA associated with $A_t$ is equal to $A_{t\sim_h}$.

Proof. Let us set $A_t = (\Sigma, Q = \mathrm{SubTreeSet}(t^\sharp), \nu, \delta)$, $A' = (\Sigma, Q', \nu', \delta')$ the accessible part of the sequential
RWTA associated with $A_t$ and $A'' = A_{t\sim_h} = (\Sigma, Q'', \nu'', \delta'')$.

According to Lemma 3, $\Delta'(r') = \{\Delta(r')\}$. According to Lemma 7, $\Delta(r') = \{r \in \mathrm{SubTreeSet}(t^\sharp) \mid h(r) =
r'\}$. Hence, $\Delta'(r') = \{\{r \in \mathrm{SubTreeSet}(t^\sharp) \mid h(r) = r'\}\}$, the only element of which is an equivalence class of $\sim_h$. Therefore,
$Q'' = Q'$ and $\delta' = \delta''$. Moreover, for any state $C$ in $Q'$, $\nu'(C)$ and $\nu''(C)$ are both equal by definition to
$\sum_{c \in C} \nu(c)$. Consequently $A' = A''$. ■

Corollary 5. Let $\Sigma$ be an alphabet. Let $t$ be a tree in $T_\Sigma$. Then:

The accessible part of the sequential RWTA associated with $A_t$ is isomorphic to $\mathrm{seq}(A_t)$.

To sum up the properties of the sequential subtree RWTA associated with a tree:

Corollary 6. Let $\Sigma$ be an alphabet. Let $t$ be a tree in $T_\Sigma$. Then the RWTA $\mathrm{seq}(A_t)$:

- is a sequential RWTA,

- is smaller than $A_t$,

- realizes $\mathrm{SubTreeSeries}_t$,

- is constructed in time and space $O(|t|)$.

Let us now show how to extend this construction to finite tree languages. 


4.2 Subtree Automaton of a Finite Tree Language 

Let us first define a RWTA recognizing the subtree series of a finite tree language.

Definition 11. Let $\Sigma$ be an alphabet. Let $L$ be a finite tree language over $\Sigma$. The sequential subtree
automaton associated with $L$ is the RWTA $A_L = (\Sigma, Q, \nu, \delta)$ defined by:

- $Q = \mathrm{SubTreeSet}(L)$,

- $\forall t' \in Q$, $\nu(t') = \mathrm{SubTreeSeries}_L(t')$,

- $\forall f \in \Sigma$, $\forall t_1, \ldots, t_{k+1} \in Q$, $t_{k+1} \in \delta(f, t_1, \ldots, t_k) \Leftrightarrow t_{k+1} = f(t_1, \ldots, t_k)$.

Notice that by definition, for any tree $t$, $\mathrm{seq}(A_t)$ and $A_{\{t\}}$ are isomorphic.
Example 8. Let us consider the trees $t_1 = f(h(a), f(h(a), b))$ and $t_2 = f(h(a), h(b))$ defined in Example 1.
The RWTA $A_{\{t_1, t_2\}}$ is represented in Figure 8 and realizes the series $\mathrm{SubTreeSeries}_{\{t_1, t_2\}} = t_1 + t_2 + f(h(a), b) +
3h(a) + h(b) + 3a + 2b$.



Fig. 8. The RWTA $A_{\{t_1, t_2\}}$.


Similarly to the case of the sequential subtree RWTA of a tree, the subtree RWTA $A_L$ of a language needs
the subtree series to be known a priori. Let us show that $A_L$ can be computed without knowing the series. In
order to compute it, we make use of the sum and of the sequentialization, two operations defined in Section 3.

Lemma 9. Let $\Sigma$ be an alphabet. Let $L$ be a finite tree language over $\Sigma$. Let $A_L = (\Sigma, Q, \nu, \delta)$. For any tree
$r$ in $\mathrm{SubTreeSet}(L)$, it holds:

$\Delta(r) = \{r\}$.

Proof. By induction over the structure of $r = f(r_1, \ldots, r_k)$. By definition of $\Delta$, $\Delta(r) = \delta(f, \Delta(r_1), \ldots, \Delta(r_k))$.
By induction hypothesis, $\Delta(r_i) = \{r_i\}$. Thus, $\Delta(r) = \delta(f, r_1, \ldots, r_k)$. Hence $\Delta(r) = \{r\}$. ■

As a direct consequence of Lemma 9, $A_L$ is sequential. Let us show now that this RWTA can be obtained
by an inductive sequentialization.

Proposition 9. Let $\Sigma$ be an alphabet. Let $L_1$ and $L_2$ be two distinct finite tree languages over $\Sigma$. Then:
The RWTA $A_{L_1 \cup L_2}$ is isomorphic to the accessible part of the sequential RWTA associated with $A_{L_1} + A_{L_2}$.

Proof. By recurrence over the cardinality of $L_2$.

1. If $L_2$ is empty, then the proposition is satisfied.

2. Suppose that $L_2 = L_2' \cup \{t\}$ with $t \notin L_2'$. Then according to the recurrence hypothesis, the RWTA
$A' = A_{L_1 \cup L_2'} = (\Sigma, Q', \nu', \delta')$ is isomorphic to the sequential RWTA associated with $A_{L_1} + A_{L_2'}$. Let
$A_{\{t\}} = (\Sigma, Q_t, \nu_t, \delta_t)$, $A'' = A' + A_{\{t\}} = (\Sigma, Q'', \nu'', \delta'')$ and $A''' = (\Sigma, Q''', \nu''', \delta''')$ be the sequential
RWTA associated with $A''$. By construction, either $Q' \cap \mathrm{SubTreeSet}(t)$ is empty, or the states $r$ of $A_{\{t\}}$
have to be relabelled as $\bar{r}$.

(a) By construction, if $Q' \cap \mathrm{SubTreeSet}(t)$ is empty, it holds from Lemma 7 and from Lemma 9 that the
construction of the accessible part of the sequential RWTA associated with $A''$ is just a relabelling
of the states. Furthermore, it can be shown by definition of $A''$ that $A'' = A_{L_1 \cup L_2}$. Hence $A_{L_1 \cup L_2}$
is isomorphic to the accessible part of the sequential RWTA associated with $A_{L_1} + A_{L_2}$.

(b) Otherwise, according to Lemma 3, $\Delta'''(r) = \{\Delta''(r)\}$, that equals by construction the set $\{\Delta_t(r) \cup
\Delta'(r)\}$. Hence the states of the accessible part of $A'''$ are $\mathrm{SubTreeSet}(L_1 \cup L_2') \cup \mathrm{SubTreeSet}(t)$ that
equals by definition $\mathrm{SubTreeSet}(L_1 \cup L_2)$. Furthermore, for any state $r$ in $Q'''$,
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( r}) = vt{r) + v'[r) if r S Q' fl SubTreeSet(t), 

v'"{A'"{r)) = v"{A"{r)) = \ v"{{r}) = v'{r) if r S Q' \ SubTreeSet(f), 

i ^"({^}) = ^t ir) if r e SubTreeSet(f) \ Q'. 

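Consequently, since $A'''$ is sequential, for any state $r$ in $Q'''$,

$$\nu'''(r) = \nu'''(\Delta'''(r)) = \mathrm{SubTreeSeries}_{L_1 \cup L_2'}(r) + \mathrm{SubTreeSeries}_t(r) = \mathrm{SubTreeSeries}_{L_1 \cup L_2}(r).$$

Finally, since by construction of $A'''$, $\forall f \in \Sigma$, $\forall \{t_1\}, \ldots, \{t_{k+1}\} \in Q'''$, $\{t_{k+1}\} \in \delta'''(f, \{t_1\}, \ldots, \{t_k\}) \Leftrightarrow
t_{k+1} = f(t_1, \ldots, t_k)$, it holds that $A'''$ is isomorphic to $A_{L_1 \cup L_2}$.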


Notice that the complexity of the computation of the accessible part of $A_{L_1} + A_{L_2}$ is equal to
the size of $A_{L_1 \cup L_2}$. Hence, by setting $|L| = |A_L|$ for any tree language $L$, it can be performed in time equal to
$|L_1| + |L_2|$, and not in exponential time. Moreover, as a direct consequence of Proposition 1, Proposition [?],
Proposition 5 and Proposition 9:

Corollary 7. Let $\Sigma$ be an alphabet. Let $L$ be a finite tree language over $\Sigma$. Then:

The RWTA $A_L$ is a sequential RWTA that realizes $\mathrm{SubTreeSeries}_L$.

Consequently, the subtree automaton $A_L$ associated with a finite tree language $L$ can be computed by
summing and sequentializing all the $A_{\{t\}} = A_t$ for $t$ in $L$. Therefore, since the computation of the sequential
subtree RWTA of any tree, the sum and the sequentialization (in this case) can be performed in linear time,
it holds that:

Corollary 8. Let $\Sigma$ be an alphabet. Let $L$ be a finite tree language over $\Sigma$. Then:

The RWTA $A_L$ can be computed in time $O(|L|)$.
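
The incremental construction behind Corollaries 7 and 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: trees are nested tuples `(symbol, child_1, ..., child_k)`, states are integer identifiers obtained by hash-consing, the root weight of a state is its number of occurrences, and the hypothetical helper `add_tree` plays the role of one "sum, then sequentialize" step by reusing an existing state whenever the same subtree has already been seen.

```python
def add_tree(delta, nu, tree):
    """Merge the subtree automaton of `tree` into (delta, nu) in place.

    delta maps (symbol, child_ids) to a state id, so two equal subtrees
    always reach the same state and the automaton stays sequential.
    Returns the id of the state reached by `tree`.
    """
    child_ids = tuple(add_tree(delta, nu, child) for child in tree[1:])
    key = (tree[0], child_ids)
    state = delta.setdefault(key, len(delta))  # reuse the state or create a fresh one
    nu[state] = nu.get(state, 0) + 1           # the sum adds the root weights
    return state


def subtree_rwta(language):
    """Build (delta, nu) for a finite tree language given as an iterable of trees."""
    delta, nu = {}, {}
    for tree in language:
        add_tree(delta, nu, tree)
    return delta, nu
```

Each node of each input tree triggers one dictionary lookup on a key whose length is the arity of the node, so the expected running time of this sketch is linear in the total number of nodes, assuming constant-time hashing of short keys.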


4.3 Kernel Computation 

In order to compute the subtree kernel of two finite tree languages $L_1$ and $L_2$, we first compute the two
RWTAs $A_{L_1}$ and $A_{L_2}$. Then we compute the Cartesian product $A_{L_1} \times A_{L_2}$. Finally, we sum all the root
weights of this RWTA.

Let us first show that our modus operandi is correct:

Theorem 2. Let $\Sigma$ be an alphabet. Let $L_1$ and $L_2$ be two finite tree languages over $\Sigma$. Let $(\Sigma, Q, \nu, \delta)$ be
the accessible part of $A_{L_1} \times A_{L_2}$. Then

$$\mathrm{KerSeries}(L_1, L_2) = \sum_{q \in Q} \nu(q).$$

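Proof. Let us set $A_{L_1} = (\Sigma, Q_1, \nu_1, \delta_1)$ and $A_{L_2} = (\Sigma, Q_2, \nu_2, \delta_2)$.
By definition of the series product and from Corollary 3,

$$\begin{aligned}
\mathrm{KerSeries}(L_1, L_2) &= \sum_{t \in T_\Sigma} (\mathrm{SubTreeSeries}_{L_1} \times \mathrm{SubTreeSeries}_{L_2})(t) \\
&= \sum_{t \in T_\Sigma} (\mathrm{SubTreeSeries}_{L_1}(t) \times \mathrm{SubTreeSeries}_{L_2}(t)) \\
&= \sum_{t \in T_\Sigma} (\nu_1(t) \times \nu_2(t)).
\end{aligned}$$

According to Lemma 9, for any tree $t$ in $T_\Sigma$, $\Delta_1(t)$ (resp. $\Delta_2(t)$) is either equal to $\{t\}$ if $t \in \mathrm{SubTreeSet}(L_1)$
(resp. $t \in \mathrm{SubTreeSet}(L_2)$) or to $\emptyset$. Therefore, according to Lemma [?], $\Delta(t)$ is either equal to $\{(t, t)\}$ if
$t \in \mathrm{SubTreeSet}(L_1) \cap \mathrm{SubTreeSet}(L_2)$, or to $\emptyset$ otherwise. Moreover, by definition of $\nu$, for any tree $t$,
$\nu(t) = \nu(\Delta(t)) = \nu_1(t) \times \nu_2(t)$. Furthermore, by definition of $Q$, the state $(t, t)$ is in $Q$ if and only if
$t \in \mathrm{SubTreeSet}(L_1) \cap \mathrm{SubTreeSet}(L_2)$. Consequently,

$$\nu(Q) = \sum_{q \in Q} \nu(q) = \sum_{t \in \mathrm{SubTreeSet}(L_1) \cap \mathrm{SubTreeSet}(L_2)} \nu_1(t) \times \nu_2(t).$$

Since for any tree $t$, if $t \notin \mathrm{SubTreeSet}(L_1)$ (resp. $t \notin \mathrm{SubTreeSet}(L_2)$), then $\nu_1(t) = 0$ (resp. $\nu_2(t) = 0$),
it holds that

$$\nu(Q) = \sum_{t \in T_\Sigma} \nu_1(t) \times \nu_2(t) = \mathrm{KerSeries}(L_1, L_2).$$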
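
The three steps of this section (build both automata, take the accessible part of their product, sum its root weights) can be sketched in Python as below. This is an illustrative sketch under assumptions that are not taken from the paper: trees are nested tuples `(symbol, child_1, ..., child_k)`, states are hash-consed integer identifiers, root weights count occurrences, and the names `build` and `subtree_kernel` are hypothetical. Instead of materializing the product, the sketch walks the transitions of the first automaton bottom-up and keeps only the pairs of states that match in the second one, which are exactly the accessible product states of the form $(t, t)$.

```python
def build(language):
    """Hash-consed subtree automaton: delta maps (symbol, child_ids) to a state id,
    nu maps a state id to the number of occurrences of its subtree in `language`."""
    delta, nu = {}, {}

    def visit(tree):
        key = (tree[0], tuple(visit(child) for child in tree[1:]))
        state = delta.setdefault(key, len(delta))
        nu[state] = nu.get(state, 0) + 1
        return state

    for tree in language:
        visit(tree)
    return delta, nu


def subtree_kernel(lang1, lang2):
    """Sum of the root weights of the accessible part of the product automaton."""
    delta1, nu1 = build(lang1)
    delta2, nu2 = build(lang2)
    pair = {}  # state of A1 -> matching state of A2, i.e. an accessible product state
    total = 0
    # Dicts keep insertion order and children are inserted before their parents,
    # so every child pairing is known when a transition is examined.
    for (symbol, children), q1 in delta1.items():
        if all(c in pair for c in children):
            q2 = delta2.get((symbol, tuple(pair[c] for c in children)))
            if q2 is not None:
                pair[q1] = q2
                total += nu1[q1] * nu2[q2]  # root weight of the product state
    return total


# Trees of Example 9.
t1 = ("f", ("h", ("a",)), ("f", ("h", ("a",)), ("b",)))
t2 = ("f", ("h", ("a",)), ("h", ("b",)))
t3 = ("f", ("f", ("b",), ("h", ("b",))), ("f", ("h", ("a",)), ("h", ("b",))))
print(subtree_kernel([t1, t2], [t3]))  # 15, the value stated in Example 9
```

After the two automata are built, the loop of this sketch costs one lookup per transition of the first automaton, which stays within the bound of Theorem 3 but is not the tightest possible traversal.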


Finally, by combining the elementary complexities:

Theorem 3. Let $\Sigma$ be an alphabet. Let $L_1$ and $L_2$ be two finite tree languages over $\Sigma$. Then

$\mathrm{KerSeries}(L_1, L_2)$ can be computed in time $O(|L_1| + |L_2| + \mathrm{Card}(\mathrm{SubTreeSet}(L_1) \cap \mathrm{SubTreeSet}(L_2)))$.

Proof. From Corollary 8, $A_{L_1} = (\Sigma, Q_1, \nu_1, \delta_1)$ and $A_{L_2} = (\Sigma, Q_2, \nu_2, \delta_2)$ are constructed in time
$O(|L_1| + |L_2|)$. From Lemma [?], the accessible part of $A_{L_1} \times A_{L_2}$ is composed of the set $S$ of states of the form
$(t, t)$ with $t \in \mathrm{SubTreeSet}(L_1) \cap \mathrm{SubTreeSet}(L_2)$. From Theorem 2, $\mathrm{KerSeries}(L_1, L_2)$ is computed by summing
the root weights of the states in $S$. Therefore $\mathrm{KerSeries}(L_1, L_2)$ can be computed in time
$O(|L_1| + |L_2| + \mathrm{Card}(\mathrm{SubTreeSet}(L_1) \cap \mathrm{SubTreeSet}(L_2)))$. ■

Example 9. Let us consider the trees $t_1 = f(h(a), f(h(a), b))$, $t_2 = f(h(a), h(b))$ and $t_3 = f(f(b, h(b)), f(h(a), h(b)))$
defined in Example [?]. The RWTA $A_{\{t_3\}}$ is represented in Figure 9. The RWTA $R = A_{\{t_1, t_2\}} \times A_{\{t_3\}}$ is
represented in Figure 10. The sum of the root weights of $R$ is equal to 15, that is $\mathrm{KerSeries}(\{t_1, t_2\}, \{t_3\})$.
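
The value 15 can be checked by hand. The table below is a worked computation, assuming that $\mathrm{SubTreeSeries}_L(t)$ counts the occurrences of $t$ as a subtree of the trees of $L$; only the subtrees common to both languages contribute to the sum.

```latex
\[
\begin{array}{l|c|c|c}
t & \mathrm{SubTreeSeries}_{\{t_1, t_2\}}(t) & \mathrm{SubTreeSeries}_{\{t_3\}}(t) & \text{product} \\
\hline
a              & 3 & 1 & 3 \\
h(a)           & 3 & 1 & 3 \\
b              & 2 & 3 & 6 \\
h(b)           & 1 & 2 & 2 \\
f(h(a), h(b))  & 1 & 1 & 1 \\
\hline
\multicolumn{3}{r|}{\mathrm{KerSeries}(\{t_1, t_2\}, \{t_3\})} & 15
\end{array}
\]
```

The remaining subtrees, namely $f(h(a), b)$ and $t_1$ on one side and $f(b, h(b))$ and $t_3$ on the other, occur in only one of the two languages and therefore have weight 0 in the product.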



Fig. 9. The RWTA $A_{\{t_3\}}$.

Fig. 10. The RWTA $A_{\{t_1, t_2\}} \times A_{\{t_3\}}$.


5 Conclusion and Perspectives 

In this paper, we defined new weighted tree automata that are always sequentializable but that do not
realize all the classical recognizable series. We studied the different algebraic combinations of these automata
(sum, products, regular operations) in order to determine their closures. Once these definitions were stated, we
made use of these new structures in order to compute the subtree kernel of two finite tree languages in an efficient
way.

Our technique can be applied to other computations. Indeed, other tree kernels exist, like the SST kernel.
The next step of our work is to apply our constructions in order to efficiently compute these kernels. However,
this application is not so direct, since it seems that the SST series may not be sequentializable w.r.t. a linear
space complexity. Hence we have to find different techniques, such as an extension of lookahead determinism [?],
for example.

Another perspective is related to the series realized by RWTAs. Determining exactly which family of series
they form is an open question.
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